Data integration with ICESat-2 - Part I¶
Learning Objectives
Goals
Identify and locate non-ICESat-2 data sets
Acquire data from the cloud or via direct download
Open data in Pandas and Xarray and learn basic DataFrame operations
☝️ This formatting is a Jupyter Book admonition, which uses MyST, a custom flavor of Markdown.
Needed libraries:¶
sat-search
boto3
intake-stac
matplotlib widget extension
pip install sat-search
Collecting sat-search
  Downloading sat-search-0.3.0.tar.gz (10 kB)
Successfully built sat-search
Installing collected packages: sat-search
Successfully installed sat-search-0.3.0
Note: you may need to restart the kernel to use updated packages.
pip install boto3
Requirement already satisfied: boto3 in /usr/share/miniconda3/envs/hackweek/lib/python3.9/site-packages (1.20.24)
Note: you may need to restart the kernel to use updated packages.
pip install intake-stac
Requirement already satisfied: intake-stac in /usr/share/miniconda3/envs/hackweek/lib/python3.9/site-packages (0.0.0)
Installing collected packages: pandas
Successfully installed pandas-1.3.5
Note: you may need to restart the kernel to use updated packages.
Computing environment¶
We’ll be using the following open source Python libraries in this notebook:
import ipyleaflet
from ipyleaflet import Map, GeoData, LayersControl,Rectangle, basemaps, basemap_to_tiles, TileLayer, SplitMapControl, Polygon
import ipywidgets
import datetime
import re
# %matplotlib widget
import satsearch
from satsearch import Search
import geopandas as gpd
import ast
import pandas as pd
import geoviews as gv
import hvplot.pandas
from ipywidgets import interact
from IPython.display import display, Image
import intake # if you've installed intake-STAC, it will automatically import alongside intake
import xarray as xr
import matplotlib.pyplot as plt
import boto3
import rasterio as rio
from rasterio.session import AWSSession
from rasterio.plot import show
from dask.utils import SerializableLock
import os
import hvplot.xarray
import numpy as np
from pyproj import Proj, transform
# Suppress library deprecation warnings
import warnings
warnings.filterwarnings('ignore')
Identify and acquire the ICESat-2 product(s) of interest¶
What is the application of this product?
What region and resolution is needed?
Download ICESat-2 ATL06 data from desired region¶
Remember icepyx? We are going to use it again to download some ICESat-2 ATL06 data over our region of interest.
import icepyx as ipx
# Specifying the necessary icepyx parameters
short_name = 'ATL06'
spatial_extent = 'hackweek_kml_jakobshavan.kml' # KML polygon centered on Jakobshavn
date_range = ['2019-04-01', '2019-04-30']
rgts = ['338'] # IS-2 RGT of interest
You may notice that we specified an RGT (Reference Ground Track). As seen below, a large number of ICESat-2 overpasses occur over Jakobshavn. In the interest of time (and computer memory), we are going to look at only one of these tracks.
# Show the area of interest (the data visualization tutorial covers mapping in more depth)
center = [69.2, -50]
zoom = 7
# Open KML file for visualizing
gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'
jk = gpd.read_file(spatial_extent, driver='KML')
m = Map(basemap=basemap_to_tiles(basemaps.NASAGIBS.ModisAquaTrueColorCR, '2020-07-18'),center=center,zoom=zoom)
geo_data = GeoData(geo_dataframe = jk)
m.add_layer(geo_data)
m.add_control(LayersControl())
m
# Setup the Query object
region = ipx.Query(short_name, spatial_extent, date_range, tracks=rgts)
# Show the available granules
region.avail_granules(ids=True)
[['ATL06_20190420093051_03380303_005_01.h5']]
Looks like we have an ICESat-2 track! Let’s quickly visualize the data to ensure that there are no clouds.
# Request information from OpenAltimetry
cyclemap, rgtmap = region.visualize_elevation()
rgtmap
Generating urls
Sending request to OpenAltimetry, please wait...
Plot elevation, please wait...
Looks good! Now it’s time to download the data.
# Set Earthdata credentials
uid = 'zhfair'
email = 'zhfair@umich.edu'
region.earthdata_login(uid, email)
# Order the granules
region.order_granules()
# Download the data
path = '/home/jovyan/website2022/book/tutorials/DataIntegration/'
region.download_granules(path)
import h5py
# Load the ICESat-2 data. We will just look at the central beams (GT2R/L)
is2_file = 'processed_ATL06_20190420093051_03380303_005_01_full.h5'
with h5py.File(is2_file, 'r') as f:
is2_gt2r = pd.DataFrame(data={'lat': f['gt2r/land_ice_segments/latitude'][:],
'lon': f['gt2r/land_ice_segments/longitude'][:],
'elev': f['gt2r/land_ice_segments/h_li'][:]})
is2_gt2l = pd.DataFrame(data={'lat': f['gt2l/land_ice_segments/latitude'][:],
'lon': f['gt2l/land_ice_segments/longitude'][:],
'elev': f['gt2l/land_ice_segments/h_li'][:]})
Identify other products of interest, and their space/time boundaries¶
What data are needed to supplement and enhance IS2 observations?¶
Satellite or in situ data sets?
We will show ATM (non-AWS) and Landsat (AWS)
Where are the other data sets stored?¶
Cloud datasets in AWS https://registry.opendata.aws/
ATM: https://search.earthdata.nasa.gov/search/granules?p=C1000000062-NSIDC_ECS&pg[0][v]=f&pg[0][gsk]=-start_date&sb[0]=-50.56835,68.83371,-49.00139,69.56591&fi=ATM&as[instrument][0]=ATM&tl=1646767454.667!3!!&m=67.08956509568404!-48.11572265625!6!1!0!0,2
Sentinel-2: https://registry.opendata.aws/sentinel-2/
Landsat: https://registry.opendata.aws/usgs-landsat/
Acquire non-cloud data and open: ATM data access¶
Why did we choose April 2019 and RGT 338? In spring 2019, an airborne campaign called Operation IceBridge was flown across Jakobshavn as validation for ICESat-2. Onboard was the Airborne Topographic Mapper (ATM), a lidar that operates at both 532 nm (like ICESat-2) and 1064 nm (near-infrared). More information about Operation IceBridge and ATM may be found here: https://nsidc.org/data/icebridge
Here, we are going to try to co-register ATM spot measurements with ICESat-2. Because both data sets are rather large, co-registration can be computationally expensive, so we will only consider one flight track and the ATM 532 nm beam.
Operation IceBridge data are not available in the cloud, so these data were downloaded directly from NSIDC. If you are interested in using IceBridge data, NSIDC has a useful data portal here: https://nsidc.org/icebridge/portal/map
# Load the ATM data into a DataFrame
atm_file = 'ILATM2_20190506_151600_smooth_nadir3seg_50pt.csv'
atm_l2 = pd.read_csv(atm_file)
atm_l2.head()
Opening and manipulating data in Pandas¶
atm_l2 = atm_l2[atm_l2['Track_Identifier']==0]
# Change the longitudes to be consistent with ICESat-2
atm_l2['Longitude(deg)'] -= 360
Pandas excels at helping you explore, clean, and process tabular data, such as data stored in spreadsheets or databases. In pandas, a 1-D data table is called a Series, and a 2-D data table is called a DataFrame, which we saw just above.
read_csv is the easiest way to open a csv file as a pandas DataFrame. We can specify formatting, data selection, indexing, and much more when reading data into a DataFrame. Here we rename the columns, assign the first column (UTC) as the index, and tell pandas that the first row of the file holds the old header (so that it doesn't get read in as data).
# Load data with specific headers
headers = ['UTC', 'Lat', 'Lon',
'Height', 'South-to-North_Slope',
'West-to-East_Slope', 'RMS_Fit', 'Number_Measurements',
'Number_Measurements_Removed',
'Distance_Of_Block_To_The_Right_Of_Aircraft', 'Track_Identifier']
atm_l2 = pd.read_csv(atm_file, names=headers, index_col=0, header=0)
# Re-reading the file reset the longitudes, so shift them to match ICESat-2 again
atm_l2['Lon'] -= 360
atm_l2.head()
# Subset the ICESat-2 data to the ATM latitudes
is2_gt2r = is2_gt2r[(is2_gt2r['lat']<atm_l2['Lat'].max()) & (is2_gt2r['lat']>atm_l2['Lat'].min())]
is2_gt2l = is2_gt2l[(is2_gt2l['lat']<atm_l2['Lat'].max()) & (is2_gt2l['lat']>atm_l2['Lat'].min())]
# Set up a map with the flight tracks as overlays
from ipyleaflet import Map, basemaps, basemap_to_tiles, Polyline
m = Map(
basemap=basemap_to_tiles(basemaps.Esri.WorldImagery),
center=(69.25, 310.35-360),
zoom=11
)
gt2r_line = Polyline(
locations=[
[is2_gt2r['lat'].min(), is2_gt2r['lon'].max()],
[is2_gt2r['lat'].max(), is2_gt2r['lon'].min()]
],
color="green" ,
fill=False
)
m.add_layer(gt2r_line)
gt2l_line = Polyline(
locations=[
[is2_gt2l['lat'].min(), is2_gt2l['lon'].max()],
[is2_gt2l['lat'].max(), is2_gt2l['lon'].min()]
],
color="green" ,
fill=False
)
m.add_layer(gt2l_line)
atm_line = Polyline(
locations=[
[atm_l2['Lat'].min(), atm_l2['Lon'].max()],
[atm_l2['Lat'].max(), atm_l2['Lon'].min()]
],
color="orange" ,
fill=False
)
m.add_layer(atm_line)
m
Now we can explore the data and DataFrame functions
atm_l2.columns
atm_l2['Lat'].head()
atm_l2.Lat.head()
If we want something more intuitive as an index
# Add datetime for index
atm_l2 = atm_l2.reset_index()
date = pd.to_datetime(atm_file[7:15])
time = pd.to_timedelta(atm_l2.UTC, unit='s')
atm_l2['DateTime'] = date + time
atm_l2 = atm_l2.set_index('DateTime')
atm_l2.head()
Now we can easily slice data by date, month, day, etc.
atm_l2.loc['2019-05-06'].head()
Or by a range of dates
atm_l2.loc['2019-05-06':'2019-05-07']
Slicing¶
Two methods for slicing:
.loc for label-based indexing
.iloc for positional indexing
# Use loc to slice specific indices
atm_l2.loc[['2019-05-06 15:16:09.500', '2019-05-06 15:16:09.750']]
# Select indices and columns
atm_l2.loc[['2019-05-06 15:16:09.500', '2019-05-06 15:16:09.750'], ['Lat', 'Lon', 'Height']]
# Use iloc to select by row and column number
atm_l2.iloc[[0,4],[0,1,2]]
Statistical manipulations¶
atm_l2.mean()
atm_l2['Height'].describe()
# Group rows together by a column and take the mean of each group
atm_l2.groupby('Track_Identifier').mean()
atm_l2.resample('30S').sum()
One alternative to looping over a DataFrame is the pandas apply() method. It behaves like Python's built-in map(): it takes a function as input and applies that function across an entire DataFrame. If you are working with tabular data, you must specify the axis you want your function to act on (0 for columns; 1 for rows). Much like map(), the apply() method can also be used with anonymous (lambda) functions. Let's look at an apply() example using our ATM data.
First, call the .apply() method on the atm_l2 DataFrame. Then use a lambda function to iterate over the rows of the DataFrame. For every row, we grab the Number_Measurements and Number_Measurements_Removed columns and pass them to the calc_total function. Finally, specify axis=1 to tell .apply() that we want to operate on rows instead of columns.
def calc_total(a,b):
total = a + b
return total
atm_l2['Total_Measurements'] = atm_l2.apply(lambda row: calc_total(row['Number_Measurements'], row['Number_Measurements_Removed']), axis=1)
atm_l2.head()
Exercise¶
Try making apply work with this new function to create a new column, ‘Distance_to_Jakobshavn’, by using the Lat and Lon as inputs.
from math import sin, cos, sqrt, atan2, radians
def distance(a,b):
'''
Calculate distance between a set point and a lat lon pair
a = lat
b = lon
'''
jlat,jlon = 69.2330, -49.2434
# approximate radius of earth in km
R = 6373.0
lat1 = radians(jlat)
lon1 = radians(jlon)
lat2 = radians(a)
lon2 = radians(b)
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))
distance = R * c
return distance
atm_l2['Distance_to_Jakobshavn']
# = atm_l2.apply(lambda row: distance(row['Lat'], row['Lon']), axis=1)
# atm_l2.head()
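Not a solution to the exercise, but a quick sanity check on the distance function above (reproduced here so the cell stands alone): it should return 0 km at Jakobshavn itself and grow by roughly 111 km per degree of latitude.

```python
from math import sin, cos, sqrt, atan2, radians

def distance(a, b):
    '''Haversine distance (km) from Jakobshavn to the point (lat=a, lon=b).'''
    jlat, jlon = 69.2330, -49.2434
    R = 6373.0  # approximate radius of Earth in km
    lat1, lon1 = radians(jlat), radians(jlon)
    lat2, lon2 = radians(a), radians(b)
    h = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return R * 2 * atan2(sqrt(h), sqrt(1 - h))

print(round(distance(69.2330, -49.2434), 6))  # 0.0 at Jakobshavn itself
```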
Reset index to start fresh for multi-indexing
atm_l2 = atm_l2.reset_index()
atm_l2.head()
Multi-indexing¶
atm_l2 = atm_l2.set_index(['DateTime','Track_Identifier']).sort_index()
atm_l2
To clear up some terminology: the levels of a MultiIndex are the former column names (DateTime, Track_Identifier). The labels are the actual values within a level (the timestamps, and the track identifiers 0 and 1). Levels can be referred to by name or position, with 0 being the outermost level.
Slicing the outermost index level is pretty easy: we just use our regular .loc[row_indexer, column_indexer]. We’ll select the columns Lat and Lon for two specific DateTime values.
atm_l2.loc[['2019-05-06 15:16:09.500','2019-05-06 15:16:09.750'],['Lat','Lon']]
If you wanted to select the rows whose track was 0 or 1: .loc wants [row_indexer, column_indexer], so let’s wrap the two elements of our row indexer (the list of DateTimes and the list of tracks) in a tuple to make it a single unit:
atm_l2.loc[(['2019-05-06 15:16:09.500','2019-05-06 15:16:09.750'],[0,1]),['Lat','Lon']]
Now suppose you want rows for any DateTime but only for specific tracks (0 and 1). Below, the : indicates all labels in this level.
atm_l2.loc[pd.IndexSlice[:,[0,1]], ['Lat','Lon']]
There are many more pandas attributes and functions to explore: type atm_l2. and press Tab in a notebook cell to see them.
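As a quick illustration (using a tiny synthetic DataFrame rather than the ATM file, so it runs standalone), a few commonly used attributes:

```python
import pandas as pd

# Small synthetic frame standing in for atm_l2 (the real columns come from the ATM file)
df = pd.DataFrame({'Lat': [69.20, 69.25], 'Lon': [-49.2, -49.3], 'Height': [120.5, 131.0]})

print(df.shape)             # (2, 3): rows x columns
print(df.dtypes)            # per-column data types
print(df.columns.tolist())  # ['Lat', 'Lon', 'Height']
print(df.describe())        # summary statistics for the numeric columns
```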
Download (Landsat) images from the cloud:¶
An Amazon Web Services (AWS) account is required to directly access Landsat data in the cloud, housed within the usgs-landsat requester-pays S3 bucket. Users can also find all objects within the Landsat record by using either the SpatioTemporal Asset Catalog (STAC) Browser (s3) or the Satellite (SAT) Application Programming Interface (API) (https).
Scene location - https://landsat-pds.s3.amazonaws.com/c1/L8/202/025/LC08_L1TP_202025_20190420_20190507_01_T1/
Scene band url for download - https://landsat-pds.s3.amazonaws.com/c1/L8/202/025/LC08_L1TP_202025_20190420_20190507_01_T1/LC08_L1TP_202025_20190420_20190507_01_T1_B4.TIF
Note: if you only want to open a single Landsat band without the full COG workflow, this walkthrough is a good reference: http://www.acgeospatial.co.uk/cog-part-2-python/
# landsatband = []
# landsat_fp = 'http://landsat-pds.s3.amazonaws.com/c1/L8/202/025/LC08_L1TP_202025_20190420_20190507_01_T1/LC08_L1TP_202025_20190420_20190507_01_T1_'
# bands=('4.TIF', '3.TIF', '2.TIF')
# for i in range(1,11): ## 11 bands of Landsat
# band = landsat_fp+'B'+str(i)+'.TIF'
# if band.endswith((bands)):
# landsatband.append(band)
# landsatband.sort(reverse=True) ## because I want the order 4,3,2 (but change if you need to!)
# print(landsatband)
Using data in the cloud from S3 buckets:¶
Landsat S3 User Manual: https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/LSDS-2032-Landsat-Commercial-Cloud-Direct-Access-Users-Guide-v2.pdf.pdf
Scene band url for access - s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2020/202/025/LC08_L1TP_202025_20190420_20190507_01_T1/LC08_L1TP_202025_20190420_20190507_01_T1_B4.TIF
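Direct S3 access to this bucket requires the requester-pays flag on every request. A minimal sketch of what that looks like with boto3 (the object Key below is hypothetical, and actually running the commented call needs valid AWS credentials):

```python
# Sketch: requester-pays S3 access with boto3. The Key is a hypothetical example;
# without RequestPayer='requester', S3 returns 403 Access Denied for this bucket.
params = dict(
    Bucket='usgs-landsat',
    Key='collection02/level-2/standard/oli-tirs/2019/202/025/<scene>/<scene>_B4.TIF',
    RequestPayer='requester',  # acknowledge that the requester is billed for the transfer
)
# import boto3
# obj = boto3.client('s3').get_object(**params)  # needs AWS credentials
# data = obj['Body'].read()
print(params['Bucket'])
```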
Find and open Landsat images in the cloud¶
How can data formats enable high-speed access?
Web Object Storage (e.g., Simple Storage Service, S3)
Inexpensive storage, BUT…
…access is via HTTP, not random disk reads
Does include HTTP “range-get” to retrieve a specific byte range
Cloud optimized approaches
Organize data as an aggregation of small, independently retrievable objects (e.g., Zarr, HDF in the cloud)
Allow access to pieces of large objects (e.g., Cloud-Optimized GeoTIFF, OPeNDAP in the cloud)
We will be working with Cloud-Optimized GeoTIFFs (COGs). A COG is a GeoTIFF file whose internal organization enables more efficient workflows in a cloud environment: clients can issue HTTP GET range requests to ask for just the parts of a file they need, instead of having to download the entire image or data set.
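The range-get idea can be sketched without any network at all: the client names a byte range, and the server returns only that slice of the object. This toy function mimics that contract on an in-memory byte string.

```python
# Conceptual sketch (no network): a range-get returns only the requested byte
# slice of an object, which is how COG readers fetch just a header or a tile.
def range_get(obj: bytes, range_header: str) -> bytes:
    # A Range header looks like "bytes=0-1023" (inclusive on both ends)
    start, end = range_header.removeprefix('bytes=').split('-')
    return obj[int(start):int(end) + 1]

fake_cog = bytes(range(256))                 # stand-in for a GeoTIFF stored in S3
header = range_get(fake_cog, 'bytes=0-15')   # fetch only the first 16 bytes
print(len(header))  # 16
```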
To explore and access COGs easily, we will use a SpatioTemporal Asset Catalog (STAC). The STAC specification provides a common, machine-readable (JSON) format for describing a wide range of geospatial datasets. STAC’s goal is to make it easier to index and discover any geospatial dataset that can be described by a spatial extent and time. STAC structures geospatial asset metadata for querying, which makes searching S3 buckets much easier. Learn more here: https://github.com/radiantearth/stac-spec.
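To make the STAC idea concrete, here is a minimal, entirely hypothetical Item: plain JSON with an id, a spatial footprint, a datetime, and an assets dict pointing at the actual files (real Landsat Items carry many more fields).

```python
import json

# A minimal, hypothetical STAC Item (id, geometry, and asset href are made up)
item = json.loads('''{
  "id": "LC08_example_scene",
  "geometry": {"type": "Polygon", "coordinates": [[[-51.3, 68.8], [-48.2, 68.8],
               [-48.2, 69.6], [-51.3, 69.6], [-51.3, 68.8]]]},
  "properties": {"datetime": "2019-05-06T00:00:00Z"},
  "assets": {"blue": {"href": "s3://usgs-landsat/...", "title": "B2"}}
}''')
print(item['id'], list(item['assets']))  # LC08_example_scene ['blue']
```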
# Sets up credentials for acquiring images through dask/xarray
os.environ["AWS_REQUEST_PAYER"] = "requester"
# Need to set up proper credentials for acquiring data through rasterio
aws_session = AWSSession(boto3.Session(), requester_pays=True)
# Extract geometry bounds for Landsat spatial bbox
geom = jk.geometry[0]
print(geom.bounds)
We will search for imagery in the STAC catalog using satsearch: https://github.com/sat-utils/sat-search
# Search STAC API for Landsat images based on a bounding box, date and other metadata if desired
bbox = (-51.3229, 68.840, -48.203, 69.616) #(west, south, east, north)
timeRange = '2019-05-06/2019-05-07'
results = Search.search(url='https://ibhoyw8md9.execute-api.us-west-2.amazonaws.com/prod',
collection='usgs-landsat/collection02/',
datetime=timeRange,
bbox=bbox,
# properties=properties,
sort=['<datetime'])
print('%s items' % results.found())
items = results.items()
# Save search to geojson file
gjson_outfile = f'/home/jovyan/website2022/book/tutorials/DataIntegration/Landsat.geojson'
items.save(gjson_outfile)
We are given a satsearch collection of items (images)
items
Load the geojson file into geopandas and inspect the items we want to collect
# Load the geojson file
gf = gpd.read_file(gjson_outfile)
gf.head(2)
# Plot search area of interest and frames on a map using Holoviz Libraries (more on these later)
cols = gf.loc[:,('id','landsat:wrs_path','landsat:wrs_row','geometry')]
footprints = cols.hvplot(geo=True, line_color='k', hover_cols=['landsat:wrs_path','landsat:wrs_row'], alpha=0.2, title='Landsat 8 T1',tiles='ESRI')
tiles = gv.tile_sources.CartoEco.options(width=700, height=500)
labels = gv.tile_sources.StamenLabels.options(level='annotation')
tiles * footprints * labels
Intake all scenes using the intake-STAC library¶
Intake-STAC facilitates discovering, exploring, and loading spatio-temporal datasets.
Intake-stac provides Intake Drivers for SpatioTemporal Asset Catalogs (STAC). It provides tools for opening STAC Catalogs, Collections, ItemCollections, and Items as Intake catalogs. By combining Intake and sat-stac, Intake-stac provides a simple toolkit for working with STAC catalogs and for loading STAC assets as Xarray objects.
catalog = intake.open_stac_item_collection(items)
list(catalog)
Let’s explore the metadata and keys for the first scene
sceneid = list(catalog)[0]
item0 = catalog[sceneid]
item0.metadata
# for keys in item0.keys():
# print (keys)
We can explore the metadata for any of these:
item0['blue'].metadata
# This is the url needed to grab data from the S3 bucket
# Using the satsearch catalog:
items[0].asset('blue')['alternate']['s3']['href'] # can use item asset name or title (blue or B2)
# Using the intake-STAC catalog
item0.blue.metadata['alternate']['s3']['href'] # must use item asset name (blue)
Open a single band with rasterio¶
# Retrieve first scene using rio
item_url = item0.blue.metadata['alternate']['s3']['href']

# Read and plot with grid coordinates
with rio.Env(aws_session):
    with rio.open(item_url) as src:
        fig, ax = plt.subplots(figsize=(9, 8))
        # Plot band 1 on the axes
        show((src, 1), ax=ax)
        # Open the data into a numpy array
        profile = src.profile
        arr = src.read(1)
Manipulating data in Xarray¶
Pandas and Xarray have very similar structures and ways of manipulating data, but Pandas excels at 2-D (tabular) data while Xarray is designed for data with more dimensions. Xarray introduces labels in the form of dimensions, coordinates, and attributes on top of Pandas-like DataFrames.
Xarray doesn’t just keep track of labels on arrays – it uses them to provide a powerful and concise interface.
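As a minimal sketch of what those labels buy you (using a tiny synthetic array, not the Landsat data), you can select by name instead of integer position:

```python
import numpy as np
import xarray as xr

# A tiny labeled array: 2 bands x 3 pixels
da = xr.DataArray(
    np.array([[1, 2, 3], [4, 5, 6]]),
    dims=('band', 'x'),
    coords={'band': ['red', 'blue'], 'x': [10, 20, 30]},
)

# Select by label rather than positional index
print(da.sel(band='blue').values)  # [4 5 6]
print(da.sel(x=20).values)         # [2 5]
```

The same `.sel()` calls work on the full Landsat scene below; only the size of the array changes.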
Read in 3 bands of a scene to xarray
sceneids = list(catalog)
sceneid = catalog[sceneids[0]]
print(sceneid.name)
band_names = ['red', 'green', 'blue']
bands = []
# Construct xarray for scene
for band_name in band_names:
    # Specify chunk size (x, y); the Landsat COG is stored in 512x512 blocks, so use 512 or a multiple
    band = sceneid[band_name](chunks=dict(band=1, x=512, y=512),
                              urlpath=sceneid[band_name].metadata['alternate']['s3']['href']).to_dask()
    band['band'] = [band_name]
    bands.append(band)
scene = xr.concat(bands, dim='band')
scene
As with pandas, you can easily explore variables. Here you work with coordinates (the equivalent of an index in pandas):
scene.x
Or keep track of arbitrary metadata (called attributes) in the form of a Python dictionary:
scene.attrs
scene.crs
You can apply operations over dimensions by name:
scene.sel(band='blue')
Mathematical operations (e.g., x - y) vectorize across multiple dimensions (array broadcasting) based on dimension names, not shape.
scene.sel(band='blue').sum()#.values
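The broadcasting behavior can be sketched with a tiny synthetic example (not the Landsat scene): arrays with different dimension names combine into a result that carries both dimensions, with no reshaping or `np.newaxis` needed.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.array([1, 2]), dims='x')
b = xr.DataArray(np.array([10, 20, 30]), dims='y')

# Broadcasting is driven by dimension names, not array shape:
# the result gains both dims, ('x', 'y'), with shape (2, 3)
c = a * b
print(c.dims)    # ('x', 'y')
print(c.values)  # [[10 20 30], [20 40 60]]
```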
Easily use the split-apply-combine paradigm with groupby:
scene.groupby('band').mean()
Database-like alignment based on coordinate labels smoothly handles missing values.
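For instance (a minimal synthetic sketch): arithmetic between two arrays aligns on their coordinate labels, much like a database join, rather than on position.

```python
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0], dims='x', coords={'x': [0, 1, 2]})
b = xr.DataArray([10.0, 20.0, 30.0], dims='x', coords={'x': [1, 2, 3]})

# Addition aligns on the x labels; only the overlapping labels (1 and 2) remain
c = a + b
print(c.x.values)  # [1 2]
print(c.values)    # [12. 23.]
```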
The N-dimensional nature of xarray’s data structures makes it suitable for dealing with multi-dimensional scientific data, and its use of dimension names instead of axis labels (dim=’time’ instead of axis=0) makes such arrays much more manageable than the raw numpy ndarray: with xarray, you don’t need to keep track of the order of an array’s dimensions or insert dummy dimensions of size 1 to align arrays (e.g., using np.newaxis).
The immediate payoff of using xarray is that you’ll write less code. The long-term payoff is that you’ll understand what you were thinking when you come back to look at it weeks or months later.
Now let’s open all the bands and multiple days together into an xarray
# Load dask and suppress warnings; each worker needs the requester-pays setting
from dask.distributed import Client
import logging

client = Client(processes=True, n_workers=4,
                threads_per_worker=1,
                silence_logs=logging.ERROR)
# Confirm that each worker inherited the requester-pays environment variable
client.run(lambda: os.environ["AWS_REQUEST_PAYER"] == "requester")
client
sceneids = list(catalog)
sceneids
Let’s create the time variable first for the xarray time dimension
# Create time variable for time dim
time_var = xr.Variable('time',gf.loc[gf.id.isin([sceneid for sceneid in sceneids if sceneid.endswith('_T1')])]['datetime'])
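The role of that `xr.Variable` can be sketched with dummy data (hypothetical timestamps standing in for the scene datetimes): passing it as the `dim` argument of `xr.concat` stacks the scenes along a new, labeled `time` dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical acquisition dates standing in for gf['datetime']
times = pd.to_datetime(['2020-01-01', '2020-01-17'])
time_var = xr.Variable('time', times)

# Two dummy 2x2 "scenes" concatenated along the new time dimension
scenes = [xr.DataArray(np.full((2, 2), i), dims=('y', 'x')) for i in range(2)]
stack = xr.concat(scenes, dim=time_var)
print(stack.dims)   # ('time', 'y', 'x')
print(stack.shape)  # (2, 2, 2)
```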
Now we will collect the names of the bands we want to load, keeping only the bands with a ground sample distance (gsd) of 30 m or coarser.
band_names = []
# Get band names
sceneid = catalog[sceneids[0]]
for k in sceneid.keys():
    M = getattr(sceneid, k).metadata
    if 'eo:bands' in M:
        resol = M['eo:bands'][0]['gsd']
        print(k, resol)
        if resol >= 30:
            band_names.append(k)
# Add QA band
band_names.append('qa_pixel')
# Import to xarray
# NaNs mark locations where concatenating multiple scenes has expanded the grid
scenes = []
for sceneid in sceneids:
    if sceneid.endswith('_T1'):
        sceneid = catalog[sceneid]
        print(sceneid.name)
        bands = []
        # Construct xarray for scene
        for band_name in band_names:
            # Chunk size is a multiple of the 512x512 Landsat COG internal tiling
            band = sceneid[band_name](chunks=dict(band=1, x=2048, y=2048),
                                      urlpath=sceneid[band_name].metadata['alternate']['s3']['href']).to_dask()
            band['band'] = [band_name]
            bands.append(band)
        scene = xr.concat(bands, dim='band')
        scenes.append(scene)

# Concatenate scenes along the time variable (this is the slowest step)
ls_scenes = xr.concat(scenes, dim=time_var)
pix = ls_scenes.transform[0]
epsg = ls_scenes.crs
ls_scenes
We now have 2 Landsat scenes with all of the bands we are interested in stored in an xarray.
We can also easily subset and visualize:
sbands = ['blue', 'nir08', 'swir16', 'cirrus', 'lwir11']
sub_box = 1
# for t in ls_scenes.time.values:
t = ls_scenes.time.values[0]
# Further subset to polynya location if desired
pulx = -1590700
puly = -428200
plrx = -1584000
plry = -433600
image = ls_scenes.sel(time=t, band=sbands, y=slice(plry, puly), x=slice(pulx, plrx))
# For use at the end for coordinates
pol_y = image.y
pol_x = image.x
image = np.array(image.where(image.notnull(), 0))
image = np.moveaxis(image, 0, -1)
n_band = image.shape[2]
print(t)
fig, (ax, ax1) = plt.subplots(ncols=2, figsize=(6, 4))
ax.imshow(image[:, :, 0])
ax.title.set_text('Subset window')
ls_scenes.sel(time=t, band='blue').plot(ax=ax1)
Summary¶
You have seen how we can search for non-ICESat-2 datasets, open them into pandas and xarray data structures, and manipulate and explore those structures.
Note
You may have noticed Jupyter Book adds some extra formatting features that do not necessarily render as you might expect when executing a notebook in Jupyter Lab. This “admonition” note is one such example.
Warning
Jupyter Book is very particular about Markdown header ordering when automatically creating the table of contents on the website. In this tutorial we are careful to use a single main header (#) and sequential subheaders (##, ###, etc.).
References¶
To further explore the topics of this tutorial see the following detailed documentation: